-
Notifications
You must be signed in to change notification settings - Fork 722
Extending logic to add_csv_partitions and leveraging catalog_table_input #674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
| serde_info = { | ||
| "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe" | ||
| if serde_library is None | ||
| else serde_library, | ||
| "Parameters": {"field.delim": sep, "escape.delim": "\\"} if serde_parameters is None else serde_parameters, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we missed this method in the previous PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxispeicher, apologies, I am not sure if I already tagged you to this PR or not for review?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure either. If so, I completely missed it 🙈
I also thought that I did include it but seems like I forgot about it eventually 😓.
| serde_info: Dict[str, Any] = {} | ||
| if catalog_table_input: | ||
| serde_info = catalog_table_input["StorageDescriptor"]["SerdeInfo"] | ||
| serde_library: Optional[str] = serde_info.get("SerializationLibrary", None) | ||
| serde_parameters: Optional[Dict[str, str]] = serde_info.get("Parameters", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the to_csv method, instead of passing None to the create_ csv_table and add_csv_partitions method, we extract the serde info from the existing catalog table if it exists
| wr.catalog.add_csv_partitions( | ||
| database=glue_database, | ||
| table=glue_table, | ||
| partitions_values=response["partitions_values"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extended the test to include add_csv_partitions
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
maxispeicher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thank you for adding the missing parts 🙂
| serde_info = { | ||
| "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe" | ||
| if serde_library is None | ||
| else serde_library, | ||
| "Parameters": {"field.delim": sep, "escape.delim": "\\"} if serde_parameters is None else serde_parameters, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure either. If so, I completely missed it 🙈
I also thought that I did include it but seems like I forgot about it eventually 😓.
Issue #, if available:
#672
Description of changes:
Added some enhancements to previous PR #673
catalog_table_inputdetails when available to extract serde info from existing catalog tablesadd_csv_partitionsmethodBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.